Support vector machine classifier for prediction of the metastasis of colorectal cancer
Abstract
Colorectal cancer (CRC) is one of the most common cancers and a major cause of mortality. The present study aimed to identify potential biomarkers for CRC metastasis and uncover the mechanisms underlying the etiology of the disease. Five datasets (GSE68468, GSE62321, GSE22834, GSE14297 and GSE6988) were used, all of which contained metastatic and non-metastatic CRC samples. Three of these datasets were integrated via meta-analysis to identify the differentially expressed genes (DEGs) between the two types of samples. A protein-protein interaction (PPI) network was constructed for these DEGs. Candidate genes were then selected by the support vector machine (SVM) classifier based on the betweenness centrality (BC) algorithm. A CRC dataset from The Cancer Genome Atlas database was used to evaluate the accuracy of the SVM classifier. Pathway enrichment analysis was carried out for the SVM-classified gene signatures. In total, 358 DEGs were identified by meta-analysis. The top ten nodes in the PPI network with the highest BC values were selected, including cAMP responsive element binding protein 1 (CREB1), cullin 7 (CUL7) and signal sequence receptor 3 (SSR3). The optimal SVM classification model was established, which was able to precisely distinguish between the metastatic and non-metastatic samples. Based on this SVM classifier, 40 signature genes were identified, which were mainly enriched in the protein processing in endoplasmic reticulum (e.g., SSR3), AMPK signaling pathway (e.g., CREB1) and ubiquitin mediated proteolysis (e.g., FBXO2, CUL7 and UBE2D3) pathways. In conclusion, the SVM-classified genes, including CREB1, CUL7 and SSR3, precisely distinguished the metastatic CRC samples from the non-metastatic ones. These genes have the potential to be used as biomarkers for the prognosis of metastatic CRC.
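The abstract ranks PPI network nodes by betweenness centrality (BC) before the SVM step. As a rough illustration of that ranking step only, the following minimal sketch computes unnormalized BC on a small undirected graph with Brandes' algorithm. The abstract does not say which software the authors used, so the function names (`build_adjacency`, `betweenness_centrality`, `top_nodes`) are assumptions. The node labels and edges are hypothetical toy data, not interactions from the study.

```python
from collections import deque


def build_adjacency(edges):
    """Turn a list of undirected edges into an adjacency dict of sets."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def betweenness_centrality(adj):
    """Unnormalized BC for an undirected, unweighted graph (Brandes' algorithm)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s: count shortest paths and record predecessors.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}   # -1 marks "not yet visited"
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}


def top_nodes(bc, k):
    """Return the k nodes with the highest BC (ties broken by name)."""
    return sorted(bc, key=lambda v: (-bc[v], v))[:k]


# Hypothetical toy network: a hub linked to three nodes, one of which has a leaf.
toy_edges = [
    ("node_H", "node_A"),
    ("node_H", "node_B"),
    ("node_H", "node_C"),
    ("node_C", "node_D"),
]
toy_bc = betweenness_centrality(build_adjacency(toy_edges))
print(top_nodes(toy_bc, 2))
```

In a workflow like the one the abstract describes, the top-ranked nodes would then serve as candidate features for a classifier. That stage is not sketched here.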
Similar resources
Fault diagnosis in a distillation column using a support vector machine based classifier
Fault diagnosis has always been an essential aspect of control system design. This is necessary because of the growing demand for increased performance and safety of industrial systems. The support vector machine classifier is a new technique based on statistical learning theory and is designed to reduce structural bias. Support vector machine classification in many applications in v...
Application of ensemble learning techniques to model the atmospheric concentration of SO2
In view of pollution prediction modeling, the study adopts homogeneous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of sulphur dioxide. For model validation, results were compared against widely known single base classifiers such as support vector machine, multilayer perceptron, linear regression and re...
Support vector regression for prediction of gas reservoirs permeability
Reservoir permeability is a critical parameter for the characterization of hydrocarbon reservoirs. In fact, determination of permeability is a crucial task in reserve estimation, production and development. Traditional methods for permeability prediction are well log and core data analysis, which are very expensive and time-consuming. Well log data is an alternative approach for prediction of pe...
Support Vector Machine Based Facies Classification Using Seismic Attributes in an Oil Field of Iran
Seismic facies analysis (SFA) aims to classify similar seismic traces based on amplitude, phase, frequency, and other seismic attributes. SFA has proven useful in interpreting seismic data, allowing significant information on subsurface geological structures to be extracted. While facies analysis has been widely investigated through unsupervised-classification-based studies, there are few cases...
The Application of Least Square Support Vector Machine as a Mathematical Algorithm for Diagnosing Drilling Effectivity in Shaly Formations
The problem of slow drilling in deep shale formations occurs worldwide, causing significant expenses to the oil industry. Bit balling, which is widely considered the main cause of poor bit performance in shales, especially deep shales, is being drilled with water-based mud. Therefore, efforts have been made to develop a model to diagnose drilling effectivity. Hence, we arrived at graphical cor...
Application of Genetic Algorithm Based Support Vector Machine Model in Second Virial Coefficient Prediction of Pure Compounds
In this work, a Genetic Algorithm boosted Least Square Support Vector Machine model, an improved version of the Support Vector Machine model that solves a set of linear equations instead of a quadratic program, was used to estimate the second virial coefficients of 98 pure compounds. Compounds were classified into different groups. The best parameters were obtained by the Genetic Algorithm method ...